Journals
  Publication Years
  Keywords
Search within results Open Search
Please wait a minute...
For Selected: Toggle Thumbnails
Optimum feature selection based on genetic algorithm under Web spam detection
WANG Jiaqing, ZHU Yan, CHEN Tung-shou, CHANG Chin-chen
Journal of Computer Applications    2018, 38 (1): 295-299.   DOI: 10.11772/j.issn.1001-9081.2017061560
Abstract415)      PDF (807KB)(305)       Save
Focusing on the issue that features used in Web spam detection are always high-dimensional and redundant, an Improved Feature Selection method Based on Information Gain and Genetic Algorithm (IFS-BIGGA) was proposed. Firstly, the priorities of features were ranked by Information Gain (IG), and dynamic threshold was set to get rid of redundant features. Secondly, the function of chromosome encoding was modified and the selection operator was improved in Genetic Algorithm (GA). After that, the Area Under receiver operating Characteristic (AUC) of Random Forest (RF) classifier was utilized as the fitness function to pick up the features with high degree of identification. Finally, the Optimal Minimum Feature Set (OMFS) was obtained by increasing the experimental iteration to avoid the randomness of the proposed algorithm. The experimental results show that OMFS, compared to the high-dimensional feature set, although the AUC under RF is decreased by 2%, the True Positive Rate (TPR) is increased by 21% and the feature dimension is reduced by 92%. And the average detecting time is decreased by 83%; moreover, by comparing to the Traditional GA (TGA) and Imperialist Competitive Algorithm (ICA), the F1 score under Bayes Net (BN) is increased by 4.2% and 3.5% respectively. The experimental results that the IFS-BIGGA can effectively reduce the dimension of features, which means it can effectively reduce the calculation cost, improves the detection efficiency in the actual Web spam detection inspection project.
Reference | Related Articles | Metrics